• Disclaimer: This tool is designed for research and is not intended for diagnostic purposes.
  • Version of TRexs: 0.3
  • Number of disease samples: 6
  • Number of control samples: 118
  • Number of repeats genotyped: 56

Distribution of repeat expansion relative to HPRC controls

  • Mouse over each point for detailed sample information.
  • Mouse over a color block for disease-related information.
  • Left click to draw rectangle to zoom into any region on the plot.
  • Note that the number of copies shown here is the sum over all expanded motifs and is not specific to motifs known to cause disease.
  • By default, the normal, premutation and pathogenic thresholds are curated using the DRED database as a base. Developers may manually curate and modify the thresholds to the best of their knowledge.
  • The transparency of the points for the disease cohort corresponds to isofor_score, an outlier score computed from allele length (regardless of motif). The higher the score, the rarer the sample’s repeat expansion.
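The idea behind an isolation-forest outlier score on allele length can be illustrated with a small from-scratch sketch. This is not the isotree implementation the report uses. The function names, the pooling of control and disease allele lengths into one input, and the subsample size of 256 are all assumptions made for illustration. Points that random cuts isolate quickly get a score close to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def expected_path(n):
    """Average path length c(n) used to normalise isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively split 1-D allele lengths at random cut points."""
    if len(values) <= 1 or depth >= max_depth or min(values) == max(values):
        return ("leaf", len(values))
    split = rng.uniform(min(values), max(values))
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    return ("node", split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus a correction for leaf size."""
    if tree[0] == "leaf":
        return depth + expected_path(tree[1])
    _, split, left, right = tree
    return path_length(left if x < split else right, x, depth + 1)


def outlier_scores(lengths, n_trees=100, sample_size=256, seed=0):
    """Return one score in (0, 1) per allele length; higher means rarer."""
    if len(lengths) < 2:
        raise ValueError("need at least two allele lengths")
    rng = random.Random(seed)
    size = min(sample_size, len(lengths))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(lengths, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = expected_path(size)
    scores = []
    for x in lengths:
        mean_depth = sum(path_length(t, x) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


# Illustrative data only: 40 control alleles of 20-29 copies, one long allele.
controls = [20 + (i % 10) for i in range(40)]
scores = outlier_scores(controls + [500])
print(round(scores[-1], 2), round(max(scores[:-1]), 2))
```

In this toy input the 500-copy allele is usually separated by the first random cut, so its score is well above that of any control allele.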

Set 1 of genes

Set 2 of genes

Set 3 of genes

Set 4 of genes

Table of potentially significant repeats

  • Curated repeat expansions with high prevalence not shown here:
  • The top row of the table can be used to filter the samples. For example, type “Yes” in the column “Pathogenic motif high?” to look for samples with expanded motifs known to cause disease (note that this will remove samples with novel motifs).
  • You can also use an expression such as “>10” in numerical columns to filter based on cut-offs.
  • This table is generated using the following logic:
    • The MC tag from the trgt output is used to get the total number of copies on each allele (e.g. “50_50” represents 50 copies of each genotyped motif, which sums to 100).
    • Samples are kept if the maximum number of copies of any motif exceeds the known pathogenic threshold. This means that if there are 1000 copies of TAAAA, we will report it even though it is not a known pathogenic motif.
    • For the remaining samples, we further filter based on known inheritance patterns. E.g. in RFC1 both alleles need to be expanded beyond the pathogenic threshold.
    • However, if a known pathogenic motif is found in a sample, the sample is tested specifically for that motif to produce “Pathogenic motif high?”. E.g. in BEAN1 we test specifically whether TGGAA has more copies than the known threshold.
    • Finally, if more than 10% of control samples, or at least 5 control samples, have more copies than the pathogenic threshold, we check whether the disease samples are expanded beyond the minimum copy number in that group of highly expanded controls, taking the inheritance pattern into account as above.
    • If more than 10 samples are being analyzed, we filter out any gene in which more than half of the samples carry an expanded repeat, as these are unlikely to be disease-causing.
    • Note that the genotyped motif for some repeats may differ from conventional testing. E.g. ATXN2 is genotyped with GCT instead of the usual CAG, so there may be an offset of 1 copy.
  • The outlier score is determined using an isolation forest (package isotree with a default of n=100 trees).
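The first few filtering steps above can be sketched as follows. This is a minimal illustration, not the tool's own code. The `Repeat` class, the function names and the threshold of 100 copies are hypothetical. It also assumes that alleles in the MC tag are separated by commas and motif counts within an allele by underscores, and that the inheritance rule applies to the motif-specific test as well.

```python
from dataclasses import dataclass


@dataclass
class Repeat:
    gene: str
    motifs: list           # motif order assumed to match the MC tag
    pathogenic_motif: str
    pathogenic_min: int    # illustrative threshold, not a curated value
    recessive: bool        # True if both alleles must be expanded


def parse_mc(mc_tag):
    """Per-allele motif counts, e.g. "50_50,12_3" -> [[50, 50], [12, 3]]."""
    return [[int(n) for n in allele.split("_")] for allele in mc_tag.split(",")]


def is_expanded(per_allele_counts, threshold, recessive):
    """Apply the inheritance rule: all alleles if recessive, else any allele."""
    hits = [count > threshold for count in per_allele_counts]
    return all(hits) if recessive else any(hits)


def evaluate(repeat, mc_tag):
    alleles = parse_mc(mc_tag)
    idx = repeat.motifs.index(repeat.pathogenic_motif)
    largest = [max(a) for a in alleles]   # largest count of any motif
    known = [a[idx] for a in alleles]     # count of the known pathogenic motif
    return {
        "total_copies": [sum(a) for a in alleles],
        "reported": is_expanded(largest, repeat.pathogenic_min, repeat.recessive),
        "pathogenic_motif_high": is_expanded(known, repeat.pathogenic_min,
                                             repeat.recessive),
    }


# Hypothetical records; the threshold of 100 copies is made up for the example.
bean1 = Repeat("BEAN1", ["TAAAA", "TGGAA"], "TGGAA", 100, recessive=False)
rfc1 = Repeat("RFC1", ["AAAAG", "AAGGG"], "AAGGG", 100, recessive=True)

print(evaluate(bean1, "1000_0,12_0"))  # reported, but not the known motif
print(evaluate(rfc1, "0_500,11_0"))    # only one allele expanded: not reported
print(evaluate(rfc1, "0_500,0_450"))   # both alleles expanded: reported
```

The sketch keeps "reported" and "pathogenic_motif_high" as separate flags, so a sample with a large novel motif still appears in the table while the motif-specific column stays negative.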

TRVZ Visualization of Potentially Pathogenic Repeats

  • By default, only repeats with matching pathogenic motifs (“Pathogenic motif high?” column above) are visualized here.

HG01981: ATXN10

Parameters

  • show_high_prev_gene : FALSE
  • version : 0.3
  • sample_sheet : /home/kpin/pb_bitbucket/TRexs/control_vcf/sample_sheet_HG001-4.tsv
  • control_tsv : /home/kpin/pb_bitbucket/TRexs/resources/control_samples_repeat_2022-10-20.tsv.gz
  • trvz_binary : /home/kpin/softwares/trgt/trvz/target/release/trvz
  • pathogenic_bed : /home/kpin/pb_bitbucket/TRexs/resources/pathogenic_repeats.hg38.bed
  • hg38 : /nrt-data/downstream/variants_calling/2022-2-6_KKH_neuro_10samples/reference/GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions_v2.fasta
  • repeats_db : /home/kpin/pb_bitbucket/TRexs/resources/repeats_information.tsv
  • high_prev_genes :
  • odir : /home/kpin/pb_bitbucket/TRexs/control_vcf/trgt_report_v0.1.1_2022-10-20